Aminet 1 (Walnut Creek)




Text File | 1992-12-31 | 14KB | 398 lines

*****************************************************************

     Voice Recognition for the Amiga using an audio digitizer.

     Voice.library (Ver 6.4) by Richard Horne - December 1992

*****************************************************************

FUNCTION OFFSET DEFINITIONS

_LVOLearn            EQU   -30
_LVORecognize        EQU   -36
_LVOAddVoiceTask     EQU   -42
_LVORemVoiceTask     EQU   -48
_LVOGainUp           EQU   -54
_LVOGainDown         EQU   -60
_LVORecDataAddress   EQU   -66
_LVORecMapAddress    EQU   -72
_LVOWordScore        EQU   -78
_LVOPickSampler      EQU   -84
_LVOSetVoicePri      EQU   -90
_LVOPickTimer        EQU   -96

************************* FUNCTION DEFINITIONS ******************

>>>>> All variables are long words unless otherwise noted. <<<<<<

NOTE: Voice.library is opened with a call to the exec OpenLibrary
function. OpenLibrary can fail for one of three reasons:

   1. The voice.library file is not available in the libs:
      directory or cannot be found.
   2. The parallel port is busy.
   3. Voice.library is currently opened and being used by
      another application program.

*****************************************************************

NAME: Learn -- Learn a spoken phrase.

OFFSET: -30

SYNOPSIS: MapAddress = Learn (MapBuffer, Text, Screen, SequenceNum, X, Y)
          d0                  a0         a1    a2      d0           d1 d2

FUNCTION:

The "Learn" function stores a frequency map of a spoken word or
phrase. Each frequency map is made up of 72 long words of data
plus a 16 byte header for the associated ASCII text (304 bytes
total). "Learn" requires the user to reserve a MapBuffer in
memory equal to the size of the desired vocabulary (number of
words) times 304 bytes.

The MapBuffer address is passed to "Learn" in a0. The address of
a null-terminated text string representing the word or phrase to
be learned is passed to "Learn" in a1. The "Learn" function will
open its own window on the screen specified in a2 (use NULL for
WBENCHSCREEN), at the position X, Y specified in d1 and d2. The
user will then be prompted to speak the specified word or phrase
to obtain three good digital samples.
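The buffer sizing rule above can be written out as a small piece
of portable C. This is only a sketch: the macro and function
names are hypothetical and are not part of voice.library. The
figures (72 long words, 16 byte header, 304 bytes per map) come
from the text above, and the 64-entry limit is the maximum
vocabulary size stated for "Learn".

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted in the Learn description. */
#define VOICE_MAP_LONGWORDS  72   /* long words of frequency data per map */
#define VOICE_MAP_HEADER     16   /* bytes of ASCII text header per map   */
#define VOICE_MAP_BYTES      (VOICE_MAP_HEADER + VOICE_MAP_LONGWORDS * 4)
#define VOICE_MAX_WORDS      64   /* documented maximum vocabulary size   */

/* Hypothetical helper (not a library call): number of bytes to reserve
   for a MapBuffer holding `words` vocabulary entries.
   Returns 0 if `words` is outside the range 1..64. */
size_t map_buffer_bytes(unsigned words)
{
    /* Sanity check against the 304-byte figure quoted in the text. */
    assert(VOICE_MAP_BYTES == 304);

    if (words == 0 || words > VOICE_MAX_WORDS)
        return 0;
    return (size_t)words * VOICE_MAP_BYTES;
}
```

For example, map_buffer_bytes(10) gives the size to allocate for
a ten-word vocabulary before the first call to "Learn".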
Internally, these three samples are analyzed for frequency
content and transformed into a frequency map (304 bytes), which
is stored in the MapBuffer according to the Sequence Number
specified in d0. "Learn" returns the memory address within
MapBuffer at which this particular frequency map is stored. If
"Learn" is intentionally cancelled using the close gadget of the
Learn Window, then a zero will be returned.

"Learn" is called separately for each word or phrase in the
vocabulary. After every word has been learned, MapBuffer will be
filled with a sequence of frequency maps (each 304 bytes). Then
the "Recognize" or "AddVoiceTask" functions can be called. These
listen to the audio digitizer, compute a frequency map of each
incoming word, compare it to the maps in MapBuffer, and indicate
by Sequence Number which word or phrase is the best match.

The maximum number of words or phrases in the vocabulary is 64.

Note that you must select an audio sampler (PerfectSound3,
SoundMaster, or Generic) using the "PickSampler" function before
using the "Learn" function.

*****************************************************************

NAME: Recognize -- Recognize a spoken word or phrase.

OFFSET: -36

SYNOPSIS: SequenceNum = Recognize (MapBuffer, SizeVocabulary, Resolution)
          d0                       a0         d0              d1

FUNCTION:

"Recognize" assumes that the user has learned a sequence of words
or phrases using the "Learn" function. MapBuffer contains a
sequence of frequency maps produced by "Learn", one for each word
or phrase in the vocabulary. The MapBuffer address is passed to
"Recognize" in a0. The number of words or phrases in the
vocabulary is passed to "Recognize" in d0.

"Recognize" listens for an incoming word, computes its frequency
map, and compares this map to the sequence of maps contained in
MapBuffer. The Sequence Number of the word or phrase in MapBuffer
which is most similar to the incoming word is returned in d0.
Note that the number "0" represents the first word, "1" the
second, and so on.

"Recognize" will operate at either high resolution (d1 = 0) or
low resolution (d1 = 1). High resolution computes a frequency
analysis of the incoming word or phrase at twice the number of
points in time as low resolution. High resolution is somewhat
better at word recognition, but takes almost twice the processing
time.

"Recognize" will return the following error codes if it cannot
find a match.

   d0 = -1  if there is no match between the incoming frequency
            map and any of the maps in MapBuffer.

   d0 = -2  if the incoming word causes unacceptable digital
            clipping. Volume should be reduced by moving your
            microphone or by using the "GainDown" function.

   d0 = -3  if the incoming word is too low in volume. Volume
            should be increased by moving your microphone or by
            using the "GainUp" function.

   d0 = -4  if the incoming sample is confused by extraneous
            noise.

*****************************************************************

NAME: AddVoiceTask -- Initiate a separate task to recognize a
                      spoken word or phrase.

OFFSET: -42

SYNOPSIS: AddVoiceTask (MapBuffer, MsgPort, SizeVocabulary, Resolution)
                        a0         a1       d0              d1

FUNCTION:

"AddVoiceTask" is similar in function to "Recognize", except that
a separate task is started under the Amiga multitasking operating
system. This task listens for incoming words or phrases and sends
messages to the user's Message Port indicating the Sequence
Number of the frequency map in MapBuffer which best matches the
frequency map of the incoming word.

The MapBuffer address and Message Port address are passed to
"AddVoiceTask" in a0 and a1. The number of words or phrases in
the vocabulary is passed to "AddVoiceTask" in d0.

"AddVoiceTask" will operate at either high resolution (d1 = 0) or
low resolution (d1 = 1). High resolution computes a frequency
analysis of the incoming word or phrase at twice the number of
points in time as low resolution.
High resolution is somewhat better at word recognition, but takes
almost twice the processing time.

The messages sent to the Message Port are designed to mimic
shortened IDCMP messages with an im_Class of $0. Thus you can
receive and process these messages at either an Intuition window
IDCMP message port or at a custom message port of your own.
Messages sent by this task are as follows.

   im_Code = Sequence Number of the frequency map in MapBuffer
             that best matches the frequency map of the incoming
             word or phrase.

   im_Code = -1  if there is no match between the incoming
                 frequency map and any of the maps in MapBuffer.

   im_Code = -2  if the incoming word causes unacceptable digital
                 clipping. Volume should be reduced by moving
                 your microphone or by using the "GainDown"
                 function.

   im_Code = -3  if the incoming word is too low in volume.
                 Volume should be increased by moving your
                 microphone or by using the "GainUp" function.

   im_Code = -4  if the incoming sample is confused by extraneous
                 noise.

Upon calling "AddVoiceTask", the PerfectSound digitizer becomes
immediately active, listening for an incoming word. After receipt
of a word or phrase, a message as described above is sent to the
Message Port. The VoiceTask then goes into a WAIT mode and
remains inactive until it receives a reply to the message it has
sent to the Message Port. Upon receipt of a reply, VoiceTask
again becomes active and listens for an incoming word.

The priority of this task will be 127 for fastest possible voice
recognition. You may change this priority to a lower value with
the "SetVoicePri" function.

*****************************************************************

NAME: RemVoiceTask -- Remove task initiated by AddVoiceTask

OFFSET: -48

SYNOPSIS: RemVoiceTask ()

FUNCTION:

Deallocates memory and removes VoiceTask from the Amiga system.
Note that the Message Port specified for the "AddVoiceTask"
function must still exist at the time you call "RemVoiceTask".
Also, you must reply to all outstanding messages from VoiceTask
BEFORE calling this function.

*****************************************************************

NAME: GainUp -- Increase gain of PerfectSound 3 audio digitizer.

OFFSET: -54

SYNOPSIS: GainUp()

FUNCTION:

Increases gain of the PerfectSound audio digitizer by one step.
Note that when gain reaches maximum, "GainUp" will wrap around
and return gain to its lowest value. Do not call this function if
you are using the SoundMaster audio digitizer.

*****************************************************************

NAME: GainDown -- Decrease gain of PerfectSound 3 audio digitizer.

OFFSET: -60

SYNOPSIS: GainDown()

FUNCTION:

Decreases gain of the PerfectSound audio digitizer by one step.
Note that when gain reaches minimum, "GainDown" will wrap around
and return gain to its highest value. Do not call this function
if you are using the SoundMaster audio digitizer.

*****************************************************************

NAME: RecDataAddress -- Return memory address of digital sample
                        of incoming word or phrase.

OFFSET: -66

SYNOPSIS: Address = RecDataAddress()
          d0

FUNCTION:

When an incoming word or phrase is digitized, 3/4 second of
digital data is stored in an internal buffer. This 8-bit
digitized data is sampled at a rate of 6400 Hz. Thus the buffer
for storing this data is 4800 bytes in size. This function
returns the address of this buffer for possible additional
experimental uses.

*****************************************************************

NAME: RecMapAddress -- Return memory address of frequency map of
                       incoming word or phrase.

OFFSET: -72

SYNOPSIS: Address = RecMapAddress()
          d0

FUNCTION:

A frequency map of each incoming word or phrase is computed for
comparison with maps learned and stored in MapBuffer. Each map
consists of a frequency analysis of 3/4 second of audio data at
72 points in time.
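The buffer figures quoted for "RecDataAddress" and the 72 time
points mentioned for "RecMapAddress" reduce to simple arithmetic,
sketched here in portable C. The names are hypothetical and not
part of voice.library. The Nyquist limit is a general property of
sampling, and the per-time-point figure is only an average: this
document does not say how the library divides the buffer between
time points.

```c
#include <assert.h>
#include <stddef.h>

#define SAMPLE_RATE_HZ   6400  /* 8-bit samples per second (from the text) */
#define CAPTURE_NUM      3     /* capture length is 3/4 second             */
#define CAPTURE_DEN      4
#define MAP_TIME_POINTS  72    /* time points per frequency map            */

/* Size of the internal sample buffer: one byte per 8-bit sample. */
size_t sample_buffer_bytes(void)
{
    return (size_t)SAMPLE_RATE_HZ * CAPTURE_NUM / CAPTURE_DEN;
}

/* Highest frequency representable at this sample rate (Nyquist limit). */
unsigned nyquist_hz(void)
{
    return SAMPLE_RATE_HZ / 2;
}

/* Average number of whole samples available per time point.
   ASSUMPTION: this is plain division of the buffer by 72; the actual
   windowing used inside the library is not documented here. */
size_t avg_samples_per_time_point(void)
{
    assert(MAP_TIME_POINTS > 0);
    return sample_buffer_bytes() / MAP_TIME_POINTS;
}
```

The first function reproduces the 4800 byte figure given for the
internal buffer.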
For each of these 72 time points, the data is examined for
frequency content at 32 points between 0 Hz and 3200 Hz.

A frequency map is made up of 72 32-bit words corresponding to
the 72 time points analyzed. For each of these 32-bit words:

   Bit 0 is set if the signal contains frequency components
         from 0-100 Hz.
   Bit 1 is set if the signal contains frequency components
         from 100-200 Hz.
   Bit 2 is set if the signal contains frequency components
         from 200-300 Hz.
   etc.

This function returns the address of this frequency map for
possible additional experimental uses. Note that this internal
frequency map does not have the 16 byte ASCII header that the
frequency maps stored in MapBuffer have.

*****************************************************************

NAME: WordScore -- Return recognition score of a recognized word.

OFFSET: -78

SYNOPSIS: Value = WordScore()
          d0

FUNCTION:

The "Recognize" function computes a numerical score representing
the "goodness" of a match between the frequency map of an
incoming word and each frequency map stored in MapBuffer. The
recognized word is determined by highest score. This function
returns the score value for the recognized word.

Internally, a score of #2000 must be achieved in order for a
match to be declared. If you wish to have a higher match score
threshold to reduce false matches, you may call "WordScore" after
each word is recognized and apply your own higher score threshold
before accepting a match. Increasing the match score threshold
will reduce false matches, but will also decrease recognition
performance.

*****************************************************************

NAME: PickSampler -- Specify which model audio sampler to use
                     (either PerfectSound3, SoundMaster, or
                     Generic).

OFFSET: -84

SYNOPSIS: PickSampler (SamplerID)
                       d0

FUNCTION:

Select the audio sampler to be used with this function.

   SamplerID = 0 for PerfectSound3.
   SamplerID = 1 for SoundMaster.
   SamplerID = 2 for Generic Sampler.

You only need to PickSampler once.
However, you should PickSampler before you Learn, Recognize, or
AddVoiceTask.

*****************************************************************

NAME: SetVoicePri -- Set the multitasking priority of a voice
                     recognition task that has been started by
                     the "AddVoiceTask" function.

OFFSET: -90

SYNOPSIS: Old Priority = SetVoicePri (New Priority)
          d0                          d0

FUNCTION:

When "AddVoiceTask" is called, a voice recognition task of
priority 127 is started for the fastest possible voice
recognition. You may modify this priority by setting New Priority
to any value between -128 and 127 and calling "SetVoicePri",
which changes the task priority to the new value and returns the
old task priority. "AddVoiceTask" must be called before
"SetVoicePri".

*****************************************************************

NAME: PickTimer -- Select either Timer A or Timer B of the CIA B
                   for use in timing digital audio samples.

OFFSET: -96

SYNOPSIS: PickTimer(TimerID)
                    d0

FUNCTION:

Voice.library uses CIA B Timer B by default for setting the time
interval between digital audio samples. You may find situations
where other applications require Timer B, causing a conflict. Use
this function to choose either Timer B or Timer A as required.

   TimerID = 0 for selection of Timer B.
   TimerID = 1 for selection of Timer A.

You only need to PickTimer once. However, you should PickTimer
before you Learn, Recognize, or AddVoiceTask.
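
As a closing illustration, the result codes shared by "Recognize"
and "AddVoiceTask", the score threshold idea described under
"WordScore", and the bit layout described under "RecMapAddress"
can be sketched in portable C. None of these names belong to
voice.library; they are hypothetical helpers. The sketch assumes
that im_Code is read as the unsigned 16-bit Code field of an
Intuition IntuiMessage, so negative codes arrive as large
unsigned values, and that "bit 0" means the least significant bit
of each 32-bit map word.

```c
#include <assert.h>
#include <stdint.h>

/* Result codes documented for Recognize (d0) and AddVoiceTask (im_Code). */
enum {
    VOICE_NO_MATCH  = -1,   /* no map in MapBuffer matched          */
    VOICE_CLIPPING  = -2,   /* input too loud, digital clipping     */
    VOICE_TOO_QUIET = -3,   /* input too low in volume              */
    VOICE_NOISE     = -4    /* sample confused by extraneous noise  */
};

/* ASSUMPTION: im_Code is a 16-bit unsigned field, so -1 arrives as 0xFFFF.
   Convert it back to a signed result without relying on casts. */
int voice_result_from_im_code(uint16_t im_code)
{
    return im_code >= 0x8000u ? (int)im_code - 0x10000 : (int)im_code;
}

/* Apply a caller-chosen score threshold on top of the library's own.
   Error codes pass through unchanged; a match whose score is below
   min_score is downgraded to VOICE_NO_MATCH. */
int voice_accept(int result, long score, long min_score)
{
    if (result < 0)
        return result;
    return score >= min_score ? result : VOICE_NO_MATCH;
}

/* Nonzero if one 32-bit map word (one time point) has a component in
   the given 100 Hz band. ASSUMPTION: bit 0 is the least significant bit. */
int map_word_has_band(uint32_t word, unsigned band)
{
    assert(band < 32);
    return (int)((word >> band) & 1u);
}

/* Lower edge in Hz of a band: band 0 is 0-100 Hz, band 1 is 100-200 Hz. */
unsigned band_low_hz(unsigned band)
{
    assert(band < 32);
    return band * 100u;
}
```

A caller would typically pass the value returned by "Recognize"
(or the converted im_Code) together with the value returned by
"WordScore" to voice_accept, and act only on a non-negative
result.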